ON THE USE OF POLICY ITERATION AS AN EASY WAY OF 
PRICING AMERICAN OPTIONS 



C. REISINGER AND J. H. WITTE 

Abstract. In this paper, we demonstrate that policy iteration, introduced in 
the context of HJB equations in [lOJ. is an extremely simple generic algorithm 
for solving linear complementarity problems resulting from the finite differ- 
ence and finite element approximation of American options. We show that, 
in general, 0{N) is an upper and lower bound on the number of iterations 
needed to solve a discrete LCP of size A'^. If embedded in a class of standard 
discretisations with M time steps, the overall complexity of American option 
pricing is indeed only 0{N{M + A'^)), and, therefore, for A/ ~ N, identical to 
the pricing of European options, which is 0{MN). We also discuss the numer- 
ical properties and robustness with respect to model parameters in relation to 
penalty and projected relaxation methods. 
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1. Introduction 

An American option is a financial instrument that gives its buyer the right, but 
not the obhgation, to buy (or sch) an asset at an agreed price at any time up to a 
certain time T. When working in a partial differential equation (PDE) framework, 
the option value V = V{t, S), where t e [0, T] and S S K"*" denote time and value of 
the underlying stock, respectively, is usually (e.g. cf. [7||3D]) the solution of a linear 
complementarity problem (LCP) 

CV > 0, 
V > P 

and CV ■{V~P) = 

with terminal condition V{T, S) — P{S), where £ is a linear (parabolic) differential 
operator and P — P{S) denotes the payoff of the option. Furthermore, if a fully 
implicit or weighted time-stepping scheme is applied to the operator C in the above 
LCP, one usually has to solve a discrete LCP in the form of Problem |1.1| at every 
time step (cf. [71 [521 ISlj)- Here, N G N denotes the length of the space grid. A 
simple example is given at the start of Section [3j 

Problem 1.1. Let A e M^^^ be an M-matrix, and let b, c& be vectors. Find 

X e M.^ such that 

Ax > b, 
X > c 

and {Ax - b)i • (a; - c)j = 0, l<i<N. 
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In this context, for z e and 1 < i < A^, (z)i is used to denote the i-th element of 
the vector z. Additionally, throughout this paper, for Z e M^^^ and 1 < « < A^, 
we will use (Z)j to denote the i-th row of the matrix Z. The definition of an 
M-matrix can be found in [HI; in particular, an M-matrix Z is non-singular with 
Z^^ > 0, i.e. every element of Z~^ is non-negative. Matrices of this form arise 
naturally from most discretisation schemes for partial differential equations. For 
the existence and uniqueness of a solution to Problem [13] see e.g. [6] or [33] . 

For more general classes of matrices A, LCPs like the one in Problem 1 1 . 1 1 have been 
widely studied from the point of view of linear and quadratic programming (cf. [S]), 
but these often make no particular use of the special structure of A arising from 
the discretisation of a differential operator (cf. [D Ej)- This makes the projected 
successive over-relaxation method (PSOR) - introduced in [6] - the most widely 
used approach in practice for finite difference matrices, in spite of its relatively slow 
convergence. 

In this paper, we aim to demonstrate that the method of policy iteration, developed 
in |10j for the numerical solution of HJB equations, yields a powerful and beautifully 
simple method for the solution of Problem Policy iteration is based on the 
interpretation of Problem |1.1| as the discrete HJB equation 

min{Ax — b,x — c} — 0, 

or, equivalently and component-wise, 

min {(f>(Ax - b), + (I - (j))(x - c) J = 0, 
0e{o,i} 

where is a control parameter: for exercise, 1 for continuation in state i. The 
availability of policy iteration as a direct algorithm for American option pricing 
does not seem to be widely known; after completing this research, however, we 
were made aware of results derived independently in [53] for "Howard's algorithm" 
(motivated by Markov decision processes, [H]), which also includes specific results 
on American option pricing. We begin by an analysis in Section [5] which is similar 
to [23] ■ for the convergence of the method in no more than N steps, and we then 
give specific examples from option pricing which demonstrate that this bound is 
sharp; numerical results in Section [3] illustrate this behaviour. We also benchmark 
this approach against PSOR (cf. (6^) and a penalty method (cf. [TT]), showing the 
competitiveness of the method in practice. 

The PSOR method is proven to converge for most matrices arising in option pricing 
applications - specifically, positive definite ones (cf. [I]) - but the number of nec- 
essary iterations grows substantially for decreasing mesh size. Projected multigrid 
methods with carefully constructed grid transfer operators, as in [2^] [13 dS] , give 
grid independent convergence rates, and practically require a similarly low number 
of iterations as the policy iteration presented here (and lower for fine meshes and/or 
large time steps), but for considerably higher implementation effort. 

The penalty method combined with a Newton-type algorithm to solve the penalised 
equation as proposed in [11] requires only the solution of a (tridiagonal) linear sys- 
tem per iteration (in one dimension) , where the number of iterations is usually very 
small in applications; this is, subject to limitations that we will discuss, inherited 
by the policy iteration presented here. In higher dimensions, the linear systems 
arising in the penalty and policy iterations can be solved efficiently by a standard 
(i.e. non-projected) multigrid method. 
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For completeness, we now briefly discuss the relation to various other methods - 
not, or only loosely based on the solution of the discrete LCP - which have been 
brought forward for pricing American options in a finite difference framework. 

In the financial industry, a standard approach is to apply the early exercise right 
'explicitly', which results in a decoupling of the two inequalities by first computing 
a continuation value, x say, i.e. the value of holding and hedging the option, and 
taking the maximum of that and the exercise value. 

Ax — b, X — max(x, c). 

Except for fully explicit timestepping schemes, where A is the identity matrix, x 
is only an approximate solution to the LCP, but it is often remarkably accurate 
in practice for sufficiently small timesteps. It is also interpretable as a Bermudan 
option approximation to the American option. 

One class of methods is based on a more sophisticated splitting of the operators, 
which approximate the continuous LCP by a sequence of simpler problems in every 
time step, rather than solving the discretised LCP directly, e.g. see [HI [19] and 
references therein; these methods are typically efficient if the underlying splitting for 
the European counterpart is accurate. However, with these approaches, there is no 
sense of solving a discrete LCP, which means that the convergence properties of the 
discretised LCP to the continuous one (e.g. see [Hl[55] and references therein) cannot 
be utilised, but convergence has to be analysed afresh for the splitted scheme. To 
our knowledge, there is no comprehensive convergence analysis for these methods 
to date. 

Another class of methods exploits explicit knowledge of the topology of the exercise 
and continuation regions, either by front-fixing (cf. |31p. method of lines coupled 
with Riccati transformations (cf. [SSlEl]), or - on the discrete level - by partition 
of the index set of the linear program as in [3] and [1] . The method proposed here 
is similar to [1] in the sense that it is a direct method for the solution of the LCP 
with a finite number of iterations, but, differing from 0], it is not dependent on 
any structure of the discrete exercise region. 

The method proposed in this paper is closely related to the policy iteration in |10] , 
adapted to the case of an early exercise option. Implementationally, it is equivalent 
to an iterative scheme for variational inequalities in ^16j and the primal-dual active 
set strategy used in [2T], a connection also observed in [23]. As we will discuss, it 
can also be seen as the limit for infinite penalty parameter of an adaptation of the 
standard penalty method in |33j to this case. Interestingly, the formal limit of the 
standard method in [TT] does not lead to a feasible policy iteration. The absence 
of any penalty parameter in the method presented here has some advantages, since 
it averts the question about the intensity of the penalisation. It is conceptually a 
simpler and arguably more intuitive approach. 

Finally, we point out that our method extends to most jump models in an obvious 
way, as it is model independent to the extend that only the M-matrix structure of 
the discretised equations matters. 

Structure of this Paper. In Section [2] we introduce policy iteration in the con- 
text of American option pricing, and we prove finite convergence in at most A'^ -|- 1 
steps, where N x N is the size of the discretisation matrices; we also prove that this 
bound is sharp and that monotonicity is an essential requirement. In Section [3| we 
present detailed numerical results. Finally, in Section |4j we discuss the relation to 
semi-smooth Newton iterations for a penalty approximation. 
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2. The Method of Policy Iteration 

In this section, we describe how to use pohcy iteration as an algorithm for solving 
Problem 1 1.1 1 We adapt algorithm and proof from [TU] to the specific situation of the 
discrete LCP considered here. We use In to denote the identity matrix in M.^^^ , 
and we consider 

{M^xh + (1 - <j>i){x)i) - {Mb)r + (1 - C^^){C)^) =0, 1 < Z < A^, 

where G argmin^g^Q {(^{Ax — b)i + (1 — 4>){x — c)^}, 1 < i < N, is an optimal 
policy in state i (but not necessarily unique); put differently, the solution x* to 
Problem II. II solves 

A*x* = b*, 

where (A*), = + (1 - 0,)(/Ar), and (b*), = 0,(6), + (1 - 0,)(c), ,l<i<N. 

Algorithm 2.1. Let x° G M^. For x" given, let 0" e M^, A" e anrf 
6" G &e suc/i that, for 1 < i < N , we have 

(</)"), G argmin{(/.(^a;" - 6), + (1 - 0)(x" - c),}, 
0e{o,i} 

(A"), - (0"),(A), + (1~(0"),)(/^), 
and ibn^ ^ ir Ubh + {I - ir hKch , 
from which follows 

(2.1) {A"x" - b'')i = ^min^ {^(Ax" - 6), + (1 - (f)){x" - c)i}. 

Find G swc/i f/iaf 

(2.2) = 

In essence, in each step of the iteration in Algorithm |2.1[ we check pointwise which 
inequality is violated the most, and solve that one with equality. (The only excep- 
tion to this is when both inequalities are non-negative, in which case we solve the 
smaller one with equality; however, as we will see in (2.3), this can only happen 
when computing x^.) 

2.1. Finite termination and linear complexity. 



Theorem 2.2. Let (a:;")^o ^e the sequence generated by Algorithm 2.1 Then 

> x", n G N, 

and = x*, n> k, 

where x* is the solution to Problem \l.l\ and k is a constant independent of x^ . 

Proof. It can be found in [9 that, since A is an M-matrix by assumption. A" as 
defined in (2.1) is an M-m atrix; this means in particular that A" is non-singular, 
and, hence. Algorithm 2.1 is well defined. Independent of n G N, there are only 2^ 



different compositions that can be assumed by A^ and 6", as follows from (2.1), 
consequently there are only 2^ different possible values for x", n €N. From (2.1) 
and (2.2) follows 

(2.3) = A"^^x" - b"-^ > min{Ax" - 6, x" - c} = - 6", 

and therefore 

^"(x"+i - x") = (yl"a;"+i - 6") - (A"x" - 5") 

> (yl"a;"+i - 6") - (A^-^x" - b"-^) ^ 0. 

The M-matrix property of A„ , (A")"i > 0, implies that x"'+^ > x"- for n G N. 
Having established that (x")^]^ is a monotone sequence in a finite set, we may set 
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K = 2 , and we can deduce the existence of a limit x* 
( [2J| ) and that x* solves Problem [L1| 



It follows directly from 
□ 



In the following corollary, we will equip Algorithm |2.1| with a tie-breaker: if, in 

)i = mm{{Ax''' 



(2.1 ), we have a tie of the form 

(Ax" - 6), = (x" 
then we set {(f>")i — 1, 
(2.4) {A"), : 

such that (A"a;" - 6"), = (Aa;" 



and (6"), = (6)„ 



Now, based on (2.4), the following corollary 



provides a much sharper bound for the maximum number of steps of Algorithm 



2.1 In the next section, we will give an example which shows that, in common 



applications, the obtained order is indeed sharp. 



Corollary 2.3. If we use the tie-breaker (2.4 1, then the result of Theorem 2.2 holds 
for K — N + 1, i.e. Algorithm\2.1\ has finite termination in at most A'^ + 1 steps. 



Proof. From (2.3 1, for 1 < i < A^, n e N, we have that 

min{(Aa;" - b), , (a;" - c) J < 0. 



(2.5) 

From (2.4) and (2.5) and the monotonicity of {x"')'^^i , for 1 < i < A^, we may 
deduce that if (a;")i > (c)^ and (A")^ = {A)i , then 

{Ax^+^-b),^0 and {x^+% > {x^),> {c), , 

which implies that (yl"+^)i — {A)i ; iterating the argument, we obtain that 

(2.6) if (x")^ > (c)j A (A"),; = {A), for n e N, then {A"')i = (A), V m > n. 
Additionally, we know that 

(2.7) {x"), > (c)i for 1 < i < TV, n > 2, 

since {x'^)i > and, if {x^)i < {c)i , then we have {Ax^ — b)i = 0, which implies 
{x^)i = {c)i . However, we also note that v4"+^ = A" for n e N U {0} imphes 
X* . Altogether, we may say that 



„n+l 



as long as Algorithm 2.1 has not yet converged, the linear system solved in 



(2.2) must change in every step, 
• and, based on (2.6 1 and (2.7 1, from x'^ onwards, every row of the linear 



system in (2.2) can change at most once before convergence, 



which means that Algorithm |2 . 1 1 must converge in no more than A^ + 2 steps. But 
the maximum number of A^ + 2 steps can only be reached if A^ — In and A* — A, 
where {A*x* — b*) — min {(Ax* — b) , {x* — c)}, which is a contradiction because 



we would have had x = x* . We may conclude that k = A^ + 1. 



□ 



We point out that, in practice, we find the performance of Algorithm 2.1 to be 



indifferent to the use of a tie-breaker as introduced in (2.4), which is not surprising 
numerically 



since 



we would expect a tie to be rare. 
2.2. Example: American Put and Other Standard Payoffs. It follows from 



the results in Section 2.1 (see also [23,) that policy iteration converges in at most 
A^ + 1 steps, where A' is the problem size. We will now show that in the valuation 
of the standard American put, the number of iterations is indeed proportional to 
N normally. Specifically, this behaviour results from the fact that for certain linear 
segments of the obstacle, as are a common feature of option payoffs, policy iteration 
moves the discrete exercise boundary only by at most one node per iteration. We 
demonstrate this on the example of the put payoff, but the result generalises to 
more general piece wise linear payoffs. 
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Proposition 2.4. Let the tridiagonal matrix A result from the finite difference 
discretisation of the Black-Scholes operator, which is only assumed to be of at least 
first order consistent. Let bi = Ci — P{Si) — 'niax{K — Si,0) the put payoff with 
strike K. Let q = maxji : 5*^ < K} and, for the exact solution x* to Problem \l . 1\ 
let e = min{i : {Ax* — b)i ~ 0} < q (a discrete exercise boundary). If x'^ — c, then 
Algorithm\2.1\ terminates in no less than q — e steps. 



Proof. We prove by induction that 

(2.8) x'^ = c, for alH < g 



where n denotes the n-th step of Algorithm 2.1 This is true for n = by choice 



of x^. Now assume that (2.8 1 holds for some n > 0. Noting that CP = rK > for 
S < K, and that any first order consistent discretisation is exact for linear functions, 
it follows that (Ax" — b)i — rK > if x'^_i — x^ — x2_^i = Ci and i < q, which is 



given for i < q — n—1 by (2.8). Hence, min{(v4a;"~5)i, {x — c)i} = {x — c)i — 0, and, 
for z < q — n — 1 = q — {n+1), we h ave ( A^)i — {lN)i and {b^)i — Ci ; consequently, 
x""*"^ = Ci for those «, which proves (2.81 for step n + 1. That is to say the discrete 



free boundary moves by no more than one index in each iteration, (Ax" — 6)e > 
for n < q — e, and the iteration terminates not before step q — e. □ 



Proposition 2.4 demonstrates that, for a put payoff, policy iteration reduces to the 
following simple method: nodes are picked systematically as candidates for the 
discrete exercise boundary, and, if the value function with this exercise strategy 
solves the linear complementarity problem, the solution is found; if not, the next 
candidate node is tested. This procedure is similar in spirit to the approaches in 
[3] and [4]. The advantage of policy iteration, however, is that it does not require 
any knowledge of the topology of exercise and continuation regions, but these arise 
as a by-product. 

Remark 2.5. In Proposition \2.4\ the number of nodes q—e between the free bound- 
ary and the strike will be approximately N{K — Se)/S„iax = 0{N), if Se is the 
exercise boundary and Smax the largest node on a uniform grid with N nodes. 

In the present setting, where the discretisation matrix A can be expected to be 
tridiagonal, the linear system in each step of Algorithm |2.1| can be solved by the 
Thomas algorithm (cf. [5]) in computational complexity 0{N). Since, by Corollary 



2.3 Algorithm |2.1| conv erges in at most N steps, the overall complex ity f or the 
solution of Problem 1 1.1 1 is then at most 0{N'^). If, additionally, Problem 1 1 . 1 1 results 
from a time discretisation with M time steps, the overall complexity for the Amer- 
ican option pricing algorithm will be at most 0{MN^). Interestingly, it can be 
shown that, for a relevant class of (monotone) discretisations, the overall number 
of policy iterations, i.e. over all time steps, is bounded by M + N, such that the 
overall complexity of the method is 0{N{N + A/)). 

Corollary 2.6. Consider a monotone, at least first order consistent scheme for 
the Black-Scholes PDE (e.g. implicit Euler, or Crank- Nicolson under a time step 
constraint, and central differences on a sufficiently fine mesh). Assume that, in 
each time step, the solution from the previous time step is used as initial value for 
the policy iteration. Then the total number of policy iterations over all time steps 
1, . . . , M is exactly q — e -\- {M — m), where e is the discrete exercise boundary for 
the American put at time step M , and ni the number of time steps in which the 
discrete free boundary has moved at least one grid point. 

Proof. That q — e + {M — m) is an upper bound can be shown as in the proof of 
Proposition 4.6 in when accounting for the fact that there may be time steps 
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where the discrete free boundary is stationary. Arguments similar to those in the 
proof of Proposition |2.4| show that it is also a lower bound. □ 

Although we clearly have implicit schemes in mind, the above results also hold for 
explicit schemes, for which the free boundary can only move one grid cell per time 
step by construction. 

2.3. The Role of Monotonicity and the M-Matrix Structure. The proof of 
Theorem |2.2| crucially requires A to be an M-matrix. For general matrices A, if 



Algorithm ^Jl is well defined and converges, it follows from (2.2) that the limit still 
solves Problem However, generally, i.e. without monotonicity, we obtain no 
more than subsequence convergence, and in that case we may not find a solution. 
We will now construct such an example. 

It is clear that finite difference matrices resulting from a central difference discreti- 
sation of non-degenerate one-dimensional linear elliptic equations are M-matrices, 
as long as the grid size is sufficiently small; the discretisation matrices, in every time 
step, of standard implicit schemes for the corresponding linear parabolic problem, 
e.g. the 6'-scheme, are then also M-matrices. The M-matrix property can thus only 
fail for large drifts and coarse meshes. For example, suppose 

(2.9) _,v=f 

where cr > and /i to be determined later. If, for the sake of this example, 
we use a four point equidistant space grid consisting of Sq , Si , S2 , S3 , where 
Sq < Si < S2 < S3 , and we have known terminal data at t = T > and Dirichlet 
boundary conditions V{So , t) = V{S3 , t) = 0, f > 0, at So and S3 , then discretising 
CV fully implicitly in time (with step size k) and with central differences in space 
(with step size h) results in a discretisation matrix 



(2.10) 



1 + ^a^Sf + r ^^cj^Sl - i,n(Si)Si 

i^a'Sl + ^t^iS2)S2 1 + Aa^Si + r 



If we pick fj, sufficiently large and such that /i(S'i) < and /i(S'2) > 0, the off- 
diagonals will be positive and the M-matrix property will fail. It is easy to check 
that, for an appropriate combination of parameters fc, h, cr, fi, Si , S2 , r and payoff 
vector c. Problem |1.1| is given by 

A^i ' and 



16 8 y ' \^ 64 

Since there are only four candidate solutions, w — (0.6,6.8)^, x = (1,5)"^, y 
(1, 6)"^ and z = (3, 5)"^, where Aw = b, x = c, 

16 8)^^(64) (0 l)'"( 5^ 

one verifies that z is the unique solution of f{z) := minjylz — b, z — c} — (0, 0)"' 

fM = ( "0 ^ ) , /(^) = ( Ii ) ' ^(^^ ^(0)' ^''"^ = ( 



Using w as initial guess. Algorithm |2.1| alternates between w and y, never finding 
the correct value z. 

The chosen 'mean-repelling' drift fj, is clearly somewhat unconventional; fortunately, 
for a constant drift as in the Black-Scholes model or mean-reverting drift /x as in the 
pricing of currency options (cf. |34| ) . say, or for any (more realistic) discretisation 
in which h is sufficiently small, the discretisation matrix ( |2.10[ ) will still turn out 
to be an M-matrix, and, thus, our theory applies once again. 
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In more than one dimension, the M-matrix property is typically lost for discretisa- 
tions of cross-derivatives, independent of the mesh size. In such cases, convergence 
cannot be guaranteed by the present theory; however, we expect this not to be 
problematic for fine enough meshes. 



3. Performance for Numerical Examples 

In this section, we compare the numerical performance of Algorithm |2.1| with two 
other approaches, namely PSOR and penalisation (see [Qj and pjj, respectively). 

We price an American put with strike in a standard Black-Scholes framework 
(cf. [7]), i.e. we have 



CV = 



dV 



1 2^2d^ 



dV 



rV 



(3.1) 

and P{S) = 

where Smax 

textbook approach (e.g. cf. [29]), using a one-sided difference in time and central 
differences in space, working with M time steps and N space steps, and solving 
backwards in time with a fully implicit scheme. This means that, for every time 



dt 2 dS 
maxjif— S", 0}, and we restrict the allowed asset range to S" G [0, S„ 
> K \s taken to be large. 



We discretise (3.11 following a standard 



step, we have to solve a discrete LCP as given in Problem 1.1 where A represents 
the finite difference discretisation matrix and is given by ( A.l ), with elements as in 



(B.l) to (B.3), h represents the solution vector known from the previous time step. 



and c represents the payoff vector. The exact parameters used in our computations 
can be f ound in Table [T] Regardless of which algorithm we use for the solution of 
Problem |l.l[ we only terminate the algorithm if we have found a vector x G 
satisfying 



N 



Ax-b 



> -toL 



and 



\{Ax^b),\ 



< tol V 



> 



1(^-4.1 



tol 



< tol 



l<i< N, 



for some given tolerance tol > 0. We implement PSOR exactly as described in 
testing all over-relaxation parameters wr G {1, 1.025, 1.05, . . . , 1.875, 1.9}, and we 
use a non-linear penalty iteration with penalty parameter p as introduced in [11) . 



r 


a 


T 


K 


Smax 


tol 


P' 


0.05 


0.4 


1 


100 


600 


le-08 


le06 



Table 1. The parameters used for the numerical computations. We use 
penalty parameter p — p' /k, where k = T/M is the size of the time step, 
to correct for the scaling implicitly resulting from the time discretisation 

(cf m)- 



3.1. Dependence on Grid Size and Time Step. The results of our computa- 
tions for the different methods are summarised in Table [2] For different grid sizes, 
the table shows the maximum number and the average number of iterations needed 
to solve the discrete LCP to an accuracy of tol at one time step, and it also includes 
the overall runtime needed to price the put when solving the discrete LCP for all 
time steps. For PSOR, w}^ denotes the over-relaxation parameter that performed 
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best for that particular grid size. (The fact that w*fi changes with the grid size and 
is not known in advance is certainly a drawback of the PSOR method.) 

We see that, for all considered grid sizes, policy and penalised iteration have very 
similar numerical performances and clearly improve over PSOR. Since, for the three 
considered algorithms, we always solve the discrete LCP at every time step to the 
same accuracy tol, the numerical results are equivalent to this accuracy, and the 
computational runtime is the only criterion that is left to be considered; hence, 
policy iteration or penalisation should be the methods of choice, with policy itera- 
tion having the (mostly conceptual) advantage of being an exact solver of the LCP 
rather than an approximation. 

Table |3] shows that the maximum number of iterations increases linearly with N if 
M is kept fixed, as predicted by the theory; it increases with ^/N, UN — M, which 
we will explain in the next subsection. 

In the light of the results in Section [2.2[ we now consider in more detail the number 
of policy iterations needed to solve a single discrete LCP. To this end, we set M = 1 



PSOR 


Max Iterations 


Iterations 


Runtime 


* 


M, N = 200 


3 


2.17 


0.23s 


1.050 


M,N = 400 


5 


2.67 


1.07s 


1.075 


M, N = 800 


5 


3.15 


4.88s 


1.175 


M = 50, TV = 800 


28 


17.00 


1.61s 


1.650 


M = 800, = 50 


1 


1.00 


0.14s 


1.000 


Penalty Method 


Max Iterations 


Iterations 


Runtime 




M, N = 200 


2 


1.06 


0.04s 




M, N ^ 400 


2 


1.06 


0.08s 




M, N = 800 


3 


1.07 


0.28s 




M = 50, TV = 800 


5 


1.82 


0.03s 




M = 800, = 50 


2 


1.00 


0.07s 




Policy Iteration 


Max Iterations 


Iterations 


Runtime 




M, N = 200 


3 


1.07 


0.03s 




M, N = 400 


4 


1.06 


0.09s 




M, N = 800 


6 


1.07 


0.29s 




M = 50, TV = 800 


18 


2.08 


0.04s 




M = 800, TV = 50 


2 


1.00 


0.07s 





Table 2. Our computational results for different numbers of time and 
space steps, denoted by M and N , respectively. We see tfiat, for all cho- 
sen yrid sizes, policy iteration and penalisation perform virtually iden- 
tically and clearly outperform PSOR. 



Policy Reration 


TV = 100 


TV = 200 


TV = 400 


TV = 800 


M = 100 


2/1.05/O.Ols 


4/1.13/0.03S 


7/1.26/0.03S 


14/1.54/0.05S 


M = 200 


2/1.03/0.02S 


3/1.07/0.03S 


5/1.13/0.05S 


11/1.27/0.08S 


M = 400 


2/1.01/0.04S 


2/1.03/0.06S 


4/1.06/0.09S 


8/1.14/0.15S 


M = 800 


2/1.01/0.06S 


2/1.02/O.lls 


3/1.03/0.16S 


6/1.07/0.29S 



Table 3. Our computational results for different numbers of time and 
space steps, denoted by M and TV, respectively. The three numbers in 
every cell represent 'Max Iterations', '0 Iterations' and Runtime'. 
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One Time Step: Number of Policy Iterations 

60| . . . . . . . ^ 




100 200 300 400 500 600 700 800 
Number of Space Steps N 



Figure 1. We fix M — 1, solving exactly one discrete LCP, and we 
consider different grid sizes N . We see that the number of policy itera- 
tions required for solving the discrete LCP is clearly linear in the matrix 
dimension N . 

and consider different grid sizes N , using c as initial guess. In Figure [ij we see that 
the number of pohcy iterations required is clearly linear in N . 

Since, in Figure [l] we used one time step only, the starting value of the iteration 
can be expected to merely be a rather coarse approximation of the solution. How- 
ever, the results of Table [2] suggest that the average number of iterations is almost 
constant if there is a good initial guess, as is the case when using a sufficiently fine 
time grid where the active set changes only for a few grid points around the exercise 
boundary; more precisely, based on Table [2] the observed complexity of American 
option pricing by penalisation and policy iteration is 0{MN), using M time steps 
and requiring 0(N) for solving a tridiagonal system. 

In Table |4j we see the average number of policy iterations required when setting 
M = N and running a fully implicit and a Crank-Nicolson scheme. Again, in 
both cases, the average number of iterations is clearly bounded (and very small), 
here suggesting that the discrete free boundary only moves a small number of grid 
cells per time step. The American option pricing algorithm thus shows complexity 
0{N'^) in practice. 

As a Crank-Nicolson central difference scheme is hoped to converge with second 
order in time and space (at least with a time step selector as in [TT] ), M — N 
is usually the natural choice. Altogether, the observed complexity of an American 
option pricing algorithm based on policy iteration (or penalisation) is seen to be 
the same as for a European pricing code. 

3.2. Dependence on Volatility. It is known from [2S] that, close to expiry, 
a^(T — t) <C 1, the exercise boundary Se behaves like 
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Policy Iteration, M = N 


100 


200 


300 


400 


500 


600 


700 


800 


Fully Implicit, Iterations 


1.05 


1.07 


1.07 


1.07 


1.07 


1.07 


1.07 


1.07 


C.N. , Iterations 


1.02 


1.03 


1.05 


1.04 


1.03 


1.03 


1.03 


1.04 



Table 4. The average number of policy iterations required per time step 
when setting M = N appears to be constant, for fully implicit as well 
as Crank-Nicolson (C.N.) time stepping. Hence, in practice, we can 
price American options m 0{N'^), which is the same complexity as for 
European options. 



M = 1, N = 200 


a = 0.2 


a = 0.4 


a = 0.8 


Policy Iteration 


6/6 


13/13 


23/23 


Penalty Method 


4/4 


6/6 


7/7 


PSOR 


39/39 


302/302 


990/990 


M = 200, N = 200 


a = 0.2 


cr = 0.4 


a = 0.8 


Policy Iteration 


2/1.03 


3/1.07 


6/1.11 


Penalty Method 


2/1.03 


2/1.06 


2/1.09 


PSOR 


2/1.99 


4/2.18 


9/2.92 



Table 5. Our computational results for the performance m dependence 
on a. The two numbers in every cell represent 'Max Iterations' and '0 
Iterations'. For M — 1, policy iteration has a clear linear dependence 
on a, penalisation is almost unaffected by the change in a, and PSOR 
seems very sensitive to a. For M — N , all three schemes show only a 
mild dependence on a . 



As the number of policy iterations is proportional (cf. Remark 2.5 1 to the displace- 
ment of the free boundary from the initial guess (typically the solution from the 
previous time step), we expect that the maximum number of iterations to be at- 
tained at the first time step, where the free boundary moves fastest, and to be 
proportional to a (cf. Table [5]). If kjh, and therefore M/N, is held constant, one 
further expects that the maximum number of iterations increases with V N , as can 
be observed in Table [3j For large enough T, the average number of iterations for all 
time steps between t = T and i = will depend weakly on a because of the upper 



bound N on the total number of iterations over all time steps (cf. Proposition 2.4 
and Table [5]). 

The penalised Newton iteration is least sensitive to changes in cr, whereas PSOR is 
affected the most. In the Appendix, we derive the asymptotic convergence rate of 
PSOR as 1 — cahP , where p = \ ioi k fixed and p = 1/2 for k ^ h. Consequently, 
the asymptotic number of iterations needed to achieve a prescribed accuracy should 
increase linearly in a, which is broadly consistent with the bottom line of Table [5] 
for the case M — N — 200. The results for M — 1 are less clear cut, which may 
be attributed to a in our example not only affecting the convergence rate, but also 
(in a very non-linear way) the distance of the solution to the initial value. 

4. The Relation of Policy Iteration to Penalisation 

Here, we discuss the relation of penalty and policy iterations for American options, 
and how they can be combined to exploit the advantages of both. 

4.1. Basic Properties. The canonical and most common penalty approximation 
for American options, used e.g. in ^.IjLj and in the numerical examples of Section [3] 
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is obtained by replacing Problem |1.1| by 

(4.1) Axp — b — pmaxjc — Xp ,0} — 0, 



for p > large. A semi-smooth Newton iteration for (4.1 1 can be defined (see [TT| ) 
as per Alorithm |2.1[ but with 

(4.2) ((/.;), e argmin{(l-(/.)(a.;-c),}, 

0e{oa} 

(4.3) (A;), = (A),+p(l-(0';),)(/^), 

(4.4) and (6^). = (6). + p (1 - (0^),)(c), . 



The number of iterations needed to solve (4.1 ) can be shown to be bounded by N, as 



for the policy iteration algorithm, and this convergence is also monotone under the 



same assumptions (see |llj). This, and most of the numerical results in Section 3.1 
suggest similar properties of the two methods, although there is a visible difference 
in the iteration numbers for large N. Indeed, the iterates generated by the two 
methods are distinctly different, even for large penalty parameter, which is clear by 
inspection of the policies and (j)" . 

One may be tempted to formally set p = oo in the above equations and define a 
different policy iteration by Algorithm |2.1[ but with the policy (j) replaced by 



(4.5) e argmin{(l-,/))(x"-c),}. 

0e{o,i} 

We will now show why this does not result in a convergent algorithm, essentially 
because the policy does not take into account the PDE constraint. First, we have 



to define a tie-breaker for the case (a;")i = (c)^ , when (4.5 1 does not uniquely define 
(^"■)i . For the choice ((/'")i = 0, the iteration may get 'stuck', i.e. (x™)^ = (c)i 
for all m > n, and will not find any solution with {x*)i > (c)^ . For the choice 



(0")i = I7 however, one observes that the solution x* to Problem 1.1 is generally 
not a fixed point of this policy iteration: if x'^ = x* and {Ax* — b)i > for some i, 
then x'^ = x* = Ci , and in the next iteration {Ax^ — b)i = 0. 

We will show in the following section that a slightly modified penalty scheme has 
policy iteration as its formal limit. 

It remains to investigate why the number of Newton steps for the penalised system 
does not grow in the same way as for policy iteration when the grid is refined, 
although the policy algorithm can be viewed as a Newton method for the LCP (see 
|23j). Here, it is helpful to take the viewpoint of [50], who analyse a semi-smooth 
Newton method for the underlying continuous variational inequality in a suitable 
function space, and show superlinear convergence. The penalty iteration above can 
be seen as discretisation of the iteration in [20] , and therefore has a well-defined limit 
for vanishing grid size. This is not true for the policy iteration, which is in some 
sense inherently discrete. For a discussion of the regularity of the iterates for the 
unpenalised (continuous) variational inequality, we refer to |20j . who also remark 
on the propagation of discrete active sets, related to the discussion in Section [Z2) 



4.2. Policy Iteration as a Limit of Penalisation. We briefiy show how policy 
iteration can be seen as limit of a different penalised iteration. 



Problem 4.1. Let A, b and c be as in Problem 4-1 Let p > be large. Find 



Xp G M such that 



N 

(4.6) Axp — b + piTnrL{Axp — b,Xp — c} = 0. 
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Algorithm 4.2. Let x° e 

Algorithm 



2.1 



and find Xp G 



For Xp given, use A" 
such that 



id 6" 



introduced 



(4.7) 



(A + pv4")x;+i =b + pb" 



Theorem 4.3. Problem 4-1 has a unique solution Xp satisfying 

II * 1 1 ^ 

ll^p ||oo ^ ~ 

for a constant C > independent of p, where x* is the solution to Problem 
Furthermore, denoting by {Xp)'^=Q the sequence generated by AlgorithmU-Sl it is 



n+l ^ 71 
Xp ^ Xp 



and Xp 

where k is the same as in Theorem\2, 



n G N, 
n > K, 



Proof. The results can easily be shown by using the same techniques as in [55] . 
where more complex penalty approximations are dealt with. □ 

Now, we have seen that Problem |4.1| is a viable penalty approximation of Problem 
and that Algorith m |4.2| has properties very similar to those of Algorithm |2.1[ 
In addition, equation (4.7 1 bears strong resemblance to equation (2.1 ), especially if 
we expect the terms multiplied by p to be dominant. 



Lemma 4.4. The sequence (Xp)5^o generated by Algorithm 4.2 is uniformly bounded 
in p, i.e. 

||a;;'||oo < C, neN, 

for a constant C > independent of p. 

Proof For n G N and 1 < i < A^, it is 

(4.8) {Ax;+' - 6),; = 

or 



(4.9) 



{Ax 



,n+l 



px^p+^ -b- pc), = 0. 



For equation (4.9), we consider two cases. First, if (Aa;^^+^ — b)i > (0;^+^ 
then we have 



c). 



(4.10) 



ix"p+' - c), < < {Ax;+' - b\ 



Second, if (Aa;;j+i - b)i < (x^+i - c), , then it is 
(4.11) {Ax;+^ -bh<0< (x^+i - c) 



Now, there is no p in equations ( |4.8[ ), (4.101 and (4.11), and we can deduce the 
existence of M-matrices A* and A** such that min{6, c} < A* 2:"+^ and A**x'p~^^ < 
max{6, c}, where all rows of A* and A** are either taken from A or In ; since there 
are only finitely many compositions that can be assumed by either of A* and A**, 
we may conclude the proof. □ 



Based on Lemma 4.4 we can now show that, given identical starting values, one 
step of policy iteration does in fact correspond to the limit p — > 00 of one step of 
penalty iteration. 

Lemma 4.5. Consider the sequences {x"')'^^q and (a;p)^o generated by Algorithms 
and 4-2. respectively. If x" = for some n £N, then 

C 



2.1 



\\x 



n+l 



< 



where C > is a constant independent of n and p. 
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Proof. We have 

{A + = b + p6" and A"x''+^ = 6", 

which imphes 

+ A"(a;"+i - = -6, 

and we get the desired result by using the uniform boundedness from Lemma 4.4 



□ 

No matter how large p is, Lemma |4.5| cannot be applied iteratively to compare the 
whole sequences generated by the two schemes since it assumes identical starting 
values. However, we can make the following instructive remark. 



Remark 4.6. Loosely speaking, the estimate of Lemma \4-5\ can be interpreted to 
relate Algorithms \2.1\ and \4.S\ in the following sense: if both algorithms use the same 
starting value x'^ and we set p — oo, then the generated sequences are identical. 

4.3. A Hybrid Method. Policy iteration has the conceptual advantage that it 
solves the discrete LCP exactly, whereas with penalisation there is an additional 
(small) penalisation error that has to be controlled. The Newton iteration for 
the penalised problem, conversely, has the advantage that it converges faster in 
practice, and the speed-up can be substantial if N is large. This is an effect of 
the mechanism by which policy iteration detects the free boundary by a pointwise 
search. This suggests a hybrid approach where we use a semi-smooth Newton 
method to solve the penalised equation and then use this solution as initial value 
for policy iteration. The following proposition shows that the total number of linear 
equation solves in this combined algorithm is only by one larger than the Newton 
steps, with the advantage that the LCP is then solved exactly. 



Proposition 4.7. If Xp is the solution of the penalised equation (4-.1), then for 
sufficiently large p policy iteration with initial value Xp converges in a single step. 

Proof. It suffices to show that the discrete continuation region of the penalty solu- 
tion {i : {xp)i > Ci\ is identical to {i : Xi> q}, i.e. 

Xi> c^ 3 pq > Oy p > po : {xp)i > a. 

But this follows from — Xp||oo < C/p (see [TT], [33]). □ 

Remark 4.8. Asymptotic analysis of the penalisation error of the standard Ameri- 
can put and its exercise boundary (e.g. see |27| ] suggests that p ^ h~^ is sufficiently 
large in Proposition \4. 7[ This choice is sensible also in terms of overall accuracy 
of the solution, if the grid convergence is 0{h^) and the penalisation error 0(1/ p). 

5. Conclusion 

We show that the method of policy iteration, devised in [lO^ for the solution of dis- 
cretised HJB equations, is a natural fit to American option pricing. It is extremely 
simple in structure, and finite convergence in at most iV -I- 1 steps can easily be 
proved. Numerical results show that, in practically relevant situations, it performs 
identically to a penalty scheme and improves over PSOR. 

The simplicity advantage of the proposed method is especially noticeable in one 
dimension, where the algorithm is a small modification of a tridiagonal linear solver; 
in this case, the overall complexity for solving the linear complementarity problem is 
0(N'^) and, if used as part of an instationary American option solver, 0{N{N +My) 
overall. In higher dimensions, the proposed method is still a direct method in the 
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sense that the basic iteration has finite termination, but the hnear systems required 
by the algorithm might most suitably be solved by an iterative solver. 

Finally, we discuss how our approach can be interpreted as the formal limit of a 
Newton-type iteration applied to a penalised equation, the advantage of the policy 
iteration being that, in the limit, the penalisation error vanishes. However, it is 
also noted that policy iteration is inherently finite-dimensional, and the number 
of iterations increases for refined meshes, whereas for semi-smooth Newton meth- 
ods for a zero-order penalty term this is not the there is an underlying 
continuous iteration which is approximated. A consequence of these two points 
combined is that although two penalty approximations to the same discrete HJB 
equation may have theoretically identical properties for fixed finite dimensions, the 
infinite dimensional limit is a helpful orientation in that it provides robustness of 
the method as the grid is refined, a point related to that made in [20] . 

Acknowledgements. We thank Mike Giles and the two anonymous referees for 
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Appendix A. Implementation for Tridiagonal Matrices 



We briefly describe how Algorithm |2 . 1 1 reduces to a simple variation of the Thomas 
algorithm (cf. [5]) if A is a tridiagonal M- matrix; discretisation matrices of tridi- 
agonal structure arise from many problems that are one dimensional in space 
(cf.[Zl[2i|32]). 

Consider the situation of Problem Suppose matrix A is tridiagonal of the form 



(A.l) 



A 



( Pi 

Q!2 



V 



7i 

^2 



72 



\ 



I 



. , ^at)"^ and c = (ci , , 



and suppose vectors b and c are given by 6 = (5i , . 
respectively. Moreover, we denote the diagonals of A by a := (a2 
/? := (/?! , . . . , PnY" 7 •= (71 : ■ ■ ■ J iN-i)^ ■ In particular, it is worth pointing 
out that A is guaranteed to be an M-matrix if /? > 0; a, 7 < 0; all row sums are 
non- negative; and there is at least one positive row sum (cf. [5]). 



The following few lines of pseudo-code correspond to an implementation of Algo- 
rithm 2.1 with starting value , solving Problem |l.l| to an accuracy of tol > 0; 
for notational convenience, we introduce ai :— 0, /3o := 1, 7o '■= Ij ^0 ■= 1 ^md 
In ■■= 0. 



1: SOLVE_LCP (a;°,^,6,c,toO 
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2: 

3: DO 

^. x°^'^ = j.'n.ew 

5: a;"'^'" = MODIFIED_THOMAS_ALGORITHM (a;"''', A, &, c) 

WHILE lla;""^"' -2;°''^||oo > tol 
RETURN a;"'^'" 
END 

9 : FUNCTION MODIFIED_THOMAS_ALGORITHM (a;°", A, 6, c) 

10: FOR i = l,...,iV DO 

11: IF {xfj^^ a, + xf'^ /S, + xf,^^ 7, - 6, < xf^ - c, ) 

12: A = 

13: /?, = A-A7,_i 

14: h = h~ A6,;_i 

15: ELSE 

16: ft = 1 , 7i = , 6i = Ci 

17: END IF 

18: END FOR 

19: a.JV^'" = bN/^N 

20: FOR i = iV - 1, ... , 1 DO 

21: a;^'^- = (6, - 7, xf^^)/ft 

22: END FOR 

23: RETURN a;"*'™ 



As we expect finite termination (of. Theorem 2.2), line 6 presents a meaningful 
test of convergence. Alternatively, we could have checked to what accuracy a;"*''" 
satisfies the LCP in Problem which we will do in Section |3] 



The traditional Thomas algorithm is merely the systematic use of Gauss elimination 
for the solution of a tridiagonal system of equations, and if lines 11, 15, 16 and 17 
were removed from MODIFIED_THOMAS_ALGORITHM, the function would simply be 
computing x"'^^ such that Ax'^'^'^ = h. Hence, if we have a European pricing 
code available, the changes we have to make to account for American exercise 
are marginal: we include the function SOLVE_LCP and slightly modify the existing 
Thomas algorithm. 



Appendix B. Convergence Rates of PSOR 

Here, we derive an estimate of the convergence rate of PSOR. Although the con- 
vergence of PSOR is well documented in the literature, and - at least for SOR 
- precise convergence rates have been derived for certain problems (typically for 
finite difference discretisations of constant coefficient PDEs), we are not aware of 
any published estimates for PSOR convergence rates for Black-Scholes-type PDEs. 



For the Black-Scholes problem, and a fully implicit Euler central difference scheme, 
matrix A in Problem |1.1| is tridiagonal as in (A.l), with 

(B.l) a, = -/j/2((T^;2 +H) 

(B.2) ft = l + k{a'^i'^ +r) 

(B.3) and = -k/2{a'^i^ - ri), l<i<N, 

where k denotes the size of the time step, and A is a strictly diagonally dominant 
M-matrix. Following [T] , PSOR is based on the splitting A = L + D + U into lower 
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triangular, diagonal, and upper triangular matrices, to define an iterative scheme 
to solve Problem by 

- c = (x" - c - ujD-^{Lx''+^ + {D + U)x'' - b))^ 

for n € N and some starting value x*', where a; > 0. It follows from Theorem 4.1 in 
[1] that the convergence rate is bounded by the spectral radius p{B), where 

B = {I^ a;D~i|L|)-i |/ - wD-^{A - L)| = (/ + i^D'^L)-^ [l - wD-^{A - L)) 

is the iteration matrix of the SOR method. Following Young's theorem, see e.g. 
[T4] , we can determine convergence by analysing the iteration matrix of the Jacobi 

iteration, 

For A as above, B'^"^"^ is tridiagonal with entries 

= kl2{(j^i^ + ri)/{l + k{a^f +r)), 
A = 

and 7j = k/2{(j'^i'^ - ri)/{l + k{a'^i'^ + r)), l<i<N, 
and it follows from Gershgorin's theorem that 
13 := piB-''"') < max {ka^i'^)/{l + k{cr'^i^ + r)) = {ka^N^)l{l + k{(j^N^ + r)). 

1 < ? < A'^ 

If we let kN'^ — )■ oo, 

P<^^^.^.^Oik-^N-% 

If kN = Smaxk/h is fixed, (3 < 1 — / {%a'^)h + 0(h?) for some c > 0, chosen in this 
form for notational convenience later. If k is fixed, (3 = 1 — (? /{ia'^)h? + 0(/i*). We 
also see that the convergence rate deteriorates with increasing a. Using Young's 
theorem, 



, „x f 1 - W + _^ w/3^1_w + w2^2/4^ < < LOopt 

> \ w - 1, ujopt < w < 2, 

where ^ 

"°^* = i + v/r^' 

and it follows that 

i^opt < 2 - -\/h + 0{h). 
a 

If the leading terms on the right-hand side are chosen for w, 

c 
a 

For fixed, the number would be p{B) <\~^h. As a consequence, to reduce the 
error by a factor e, the number of required iterations is asymptotically 

if k/h is fixed, and Nn < ccr if k is fixed. A more refined analysis would show 
that the order of these upper bounds is also sharp. 
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P{B) < 1 - -Vh. 



